{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Further Hypothesis Testing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select this cell and type Ctrl-Enter to execute the code below.\n",
    "\n",
    "library(tidyverse)\n",
    "\n",
    "set_plot_dimensions <- function(width_choice, height_choice) {\n",
    "    options(repr.plot.width=width_choice, repr.plot.height=height_choice)\n",
    "}\n",
    "\n",
    "cbPal <- c(\"#E69F00\", \"#56B4E9\", \"#009E73\", \"#F0E442\", \"#CC79A7\", \"#0072B2\", \"#D55E00\")\n",
    "\n",
    "set_plot_dimensions(5, 4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You should see \"Attaching packages\" and some ticks by the packages loaded.\n",
    "# The \"Conflicts\" aren't a problem.\n",
    "\n",
    "# Other problems loading the library? Try running this cell.\n",
    "\n",
    "install.packages('tidyverse')\n",
    "\n",
    "library(tidyverse)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1 - Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this workshop, you will apply a number of commonly encountered parametric and non-parametric tests to answer a variety of research questions about an example data set.\n",
    "\n",
    "We begin with a brief exploration of the data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data set"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The file `stars.csv` contains a dataset of 240 stars, with five variables for each star:\n",
    "\n",
    "|variable | description|\n",
    "|:---|:---|\n",
    "|temperature | the surface temperature (K)|\n",
    "|luminosity | luminosity relative to sun|\n",
    "|radius | radius relative to sun|\n",
    "|spectral_class | the spectral class of each star (O,B,A,F,G,K or M)|\n",
    "|type | as defined below|\n",
    "\n",
    "\n",
    "The luminosity and radius of each star is calculated relative to that of the Sun:\n",
    "\n",
    "$L_{sun} = 3.83 \\times 10^{26}\\text{W}$\n",
    "\n",
    "$R_{sun} = 6.96 \\times 10^8\\text{m}$\n",
    "\n",
    "\n",
    "The stars are classified into 6 types:\n",
    "\n",
    "code | type\n",
    ":---|:---\n",
    "0 |Brown Dwarf\n",
    "1 |Red Dwarf\n",
    "2 |White Dwarf\n",
    "3 |Main Sequence\n",
    "4 |Supergiant\n",
    "5 |Hypergiant\n",
    "\n",
    "The dataset contains 40 examples of each type.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load the data using the `read_csv` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data <- read_csv(\"stars.csv\")\n",
    "\n",
    "type_key <- c('Brown Dwarf', 'Red Dwarf', 'White Dwarf', 'Main Sequence', 'Supergiant','Hypergiant')\n",
    "spectral_classes <- c('O','B','A','F','G','K','M')\n",
    "\n",
    "data$type <- factor(data$type)\n",
    "data$spectral_class <- factor(data$spectral_class, levels=spectral_classes)\n",
    "\n",
    "head(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will be using `ggplot2` to visualise distributions of these variables.\n",
    "\n",
    "For example, execute the following to see an overall histogram of luminosity:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ggplot(data, aes(x=luminosity)) + \n",
    "    geom_histogram(bins=50, alpha=0.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or, more usefully, on a log scale:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ggplot(data, aes(x=log(luminosity))) + \n",
    "    geom_histogram(bins=50, alpha=0.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can split the data by another variable to compare different groups of stars. \n",
    "\n",
    "For example, the following box plot shows log(luminosity), grouped by type:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ggplot(data, aes(x=type, y=log(luminosity), fill=type)) + \n",
    "    scale_fill_manual(values=cbPal) +\n",
    "    geom_boxplot(alpha=0.5) + \n",
    "    guides(fill=\"none\") +\n",
    "    coord_flip()                                          \n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
